Multi-nucleotide variants (MNVs) are two or more nucleotide variants that co-occur on the same haplotype in an individual. Taken together, they can encode a different amino acid from the one predicted for each constituent single-nucleotide variant (SNV) alone, particularly in start and stop codons. However, existing tools are limited in the genome-wide detection and annotation of MNVs, especially complex MNVs and those in non-coding regions. We therefore first developed MNVAnno, a toolbox for rapid genome-wide identification and detailed annotation of complex MNVs.
Subsequently, to comprehensively catalog MNVs in humans, we collected genotype data from 1000G, GTEx, TCGA, UK Biobank WES, and UK Biobank chip, and then identified and annotated MNVs with MNVAnno. After integrating the newly identified MNVs with existing MNVs sourced from gnomAD, we constructed the largest human MNV list, comprising 8,199,654 MNVs (hg38 version).
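
To illustrate why the variants of an MNV need to be annotated jointly rather than as separate SNVs, the following minimal Python sketch translates a single codon carrying two substitutions on the same haplotype. It is not MNVAnno's implementation; the codon, the variant positions, and the helper names (`apply_variants`, `translate`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_variants(codon, variants):
    """Return the codon after applying (0-based position, alt base) substitutions."""
    bases = list(codon)
    for position, alt in variants:
        bases[position] = alt
    return "".join(bases)


def translate(codon):
    """Translate one DNA codon to a one-letter amino acid ('*' means stop)."""
    return CODON_TABLE[codon]


# Hypothetical example: reference codon CAG (Gln) with two SNVs on one haplotype.
ref_codon = "CAG"
snv_1 = (0, "T")  # C>T at the first codon position
snv_2 = (1, "G")  # A>G at the second codon position

snv_1_only = apply_variants(ref_codon, [snv_1])    # TAG
snv_2_only = apply_variants(ref_codon, [snv_2])    # CGG
mnv = apply_variants(ref_codon, [snv_1, snv_2])    # TGG

print("reference :", ref_codon, translate(ref_codon))    # CAG Q
print("SNV 1 only:", snv_1_only, translate(snv_1_only))  # TAG * (stop gained)
print("SNV 2 only:", snv_2_only, translate(snv_2_only))  # CGG R (missense)
print("MNV       :", mnv, translate(mnv))                # TGG W (missense)
```

In this example, annotating the two SNVs separately would report a stop-gain and a Gln-to-Arg missense change, whereas the haplotype actually encodes tryptophan, so neither per-SNV annotation matches the combined effect.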